[REFACTOR] Replace in-tree cache_mem with CacheSeek integration #4
Open
yJader wants to merge 5 commits into
Replace the in-tree telefuser/cache_mem cache with cacheseek as the cross-request cache middleware.
- service (container/task_service/api_server): build and drive (CacheService, TeleFuserCacheAdapter); per request build_query -> lookup -> apply_resume -> on_response -> save
- lingbot_world_fast: world_kv hooks (on_runtime_created / on_chunk_finalized) + decode-only fast path for exact-prefix KV reuse; enable rolling KV window (local_attn_size=7, sink_size=3)
- remove legacy telefuser/cache_mem + service/cache/cache_factory | cache_service and the cache_mem unit tests
- pin torch==2.7.0 + torchvision==0.22.0
- docs: update latent_cache (en/zh)
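For readers following the per-request flow named in this commit (build_query -> lookup -> apply_resume -> on_response -> save), here is a minimal sketch of how such a lifecycle could be driven. `StubCacheAdapter` and `handle_request` are hypothetical names invented for illustration; they are not the real `TeleFuserCacheAdapter` or task service, and the method signatures, the prompt-keyed store, and calling `apply_resume` only on a hit are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class StubCacheAdapter:
    """Hypothetical in-memory stand-in; NOT the real TeleFuserCacheAdapter."""

    store: dict = field(default_factory=dict)
    calls: list = field(default_factory=list)

    def build_query(self, task_data: dict) -> str:
        self.calls.append("build_query")
        return task_data["prompt"]  # assumption: the prompt is the cache key

    def lookup(self, query: str) -> Optional[Any]:
        self.calls.append("lookup")
        return self.store.get(query)

    def apply_resume(self, task_data: dict, hit: Any) -> dict:
        self.calls.append("apply_resume")
        # Return a copy so the caller's dict is not mutated.
        return {**task_data, "resume_from": hit}

    def on_response(self, query: str, response: Any) -> None:
        self.calls.append("on_response")

    def save(self, query: str, response: Any) -> None:
        self.calls.append("save")
        self.store[query] = response


def handle_request(
    task_data: dict,
    run_pipeline: Callable[[dict], Any],
    adapter: Optional[StubCacheAdapter] = None,
) -> Any:
    """Run one request, wrapping the pipeline call in the cache lifecycle."""
    if adapter is None:  # cache disabled: plain pipeline call
        return run_pipeline(task_data)
    query = adapter.build_query(task_data)
    hit = adapter.lookup(query)
    if hit is not None:  # assumption: resume is applied only on a hit
        task_data = adapter.apply_resume(task_data, hit)
    response = run_pipeline(task_data)
    adapter.on_response(query, response)
    adapter.save(query, response)
    return response
```

Passing `adapter=None` models the default cache-disabled path, where the pipeline runs with no cache calls at all.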
…arch-v2) arch-v2 retired cacheseek.core; CacheConfig is now exported from the top-level `cacheseek` package. The cache and nocache wan22 T2V service entry points still imported arch-v1's cacheseek.core.config, which made the cacheseek approximate-reuse e2e crash with ModuleNotFoundError during service startup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
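The commit above fixes the crash by updating the import. If both layouts ever had to be tolerated at once, a resolver along these lines is one option. This is only a sketch: `resolve_attr` is a hypothetical helper, and the two module paths are taken from the commit message rather than verified against CacheSeek itself.

```python
import importlib
from typing import Any, Optional, Sequence, Tuple

# Paths as described in the commit message: arch-v2 exports CacheConfig from the
# top-level package, arch-v1 kept it under cacheseek.core.config.
DEFAULT_CANDIDATES: Sequence[Tuple[str, str]] = (
    ("cacheseek", "CacheConfig"),
    ("cacheseek.core.config", "CacheConfig"),
)


def resolve_attr(candidates: Sequence[Tuple[str, str]] = DEFAULT_CANDIDATES) -> Optional[Any]:
    """Return the first importable (module, attribute) pair, or None if none resolve."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    return None
```

Returning `None` instead of raising lets the caller decide whether a missing CacheSeek is fatal, which matters when the cache is disabled by default.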
Pull request overview
This PR refactors TeleFuser’s latent cache feature by removing the in-tree cache_mem implementation and replacing it with an optional CacheSeek-backed integration. The integration is wired through the service container / task service, exposed via CLI flags, and documented (EN/ZH), while preserving “cache disabled by default” behavior.
Changes:
- Replace TeleFuser-local latent cache wiring with CacheSeek (cache_service, cache_adapter) lifecycle hooks (lookup/resume/save) and fail-fast startup when enabled but CacheSeek is missing.
- Add CLI/server config plumbing for `--enable-latent-cache` and `--cache-mode`, plus unit tests covering lazy import and failure semantics.
- Update Wan2.2 service examples and LingBot World Fast runtime hooks for CacheSeek reuse, and refresh latent-cache docs (EN/ZH).
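The lazy-import and fail-fast semantics listed above can be sketched as follows. This is an illustration under assumptions, not the container code from this PR: `build_cache_adapter`, `LatentCacheUnavailableError`, and the placeholder return value are hypothetical, and the real CacheSeek factory call is deliberately not shown.

```python
import importlib
from typing import Any, Optional


class LatentCacheUnavailableError(RuntimeError):
    """Hypothetical error raised when the cache is enabled but cannot be built."""


def build_cache_adapter(
    enable_latent_cache: bool,
    cache_mode: Optional[str] = None,
    module_name: str = "cacheseek",
) -> Optional[Any]:
    """Import the cache backend only when the feature is on; fail fast otherwise."""
    if not enable_latent_cache:
        # Default path: no import is attempted, so startup stays lightweight.
        return None
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise LatentCacheUnavailableError(
            f"latent cache was enabled but '{module_name}' could not be imported"
        ) from exc
    # Placeholder for the real factory call, whose API is not shown in this PR page.
    return {"backend": module, "cache_mode": cache_mode}
```

Raising at startup rather than at the first request means a misconfigured deployment is caught before it serves traffic.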
Reviewed changes
Copilot reviewed 47 out of 53 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/viewer/weight_viewer.py | Minor formatting cleanup in number formatting. |
| tools/deploy/show_stat.py | Minor formatting/quoting cleanup in output strings. |
| tools/deploy/docker_monitor.py | Minor formatting cleanup in output strings. |
| tests/unit/service/test_latent_cache_task_service.py | New test validating CacheSeek lifecycle calls from MediaGenerationService. |
| tests/unit/service/test_latent_cache_cli.py | New CLI/container tests for lazy import and fail-fast behavior. |
| tests/unit/pipelines/wan_video/test_service_examples.py | New tests ensuring service examples import without CacheSeek present. |
| tests/unit/cache_mem/test_types_and_config.py | Removed legacy cache_mem unit tests. |
| tests/unit/cache_mem/test_storage.py | Removed legacy cache_mem unit tests. |
| tests/unit/cache_mem/test_metadata.py | Removed legacy cache_mem unit tests. |
| tests/unit/cache_mem/test_concurrency.py | Removed legacy cache_mem concurrency tests. |
| `tests/unit/cache_mem/__init__.py` | Removed legacy cache_mem test package init. |
| telefuser/service/main.py | Thread latent-cache flags into server config at startup. |
| telefuser/service/core/task_service.py | Switch cache flow to CacheSeek adapter (build_query/lookup/apply_resume/on_response/save). |
| telefuser/service/core/container.py | Lazy-import CacheSeek factory; fail fast on missing/failed init; store adapter in container. |
| telefuser/service/core/config.py | Add cache_mode to ServerConfig for service-level override plumbing. |
| telefuser/service/cache/cache_service.py | Removed legacy TeleFuser cache service implementation. |
| telefuser/service/cache/cache_factory.py | Removed legacy TeleFuser cache factory implementation. |
| `telefuser/service/cache/__init__.py` | Mark legacy cache namespace as deprecated (no longer a facade). |
| telefuser/service/api/api_server.py | Forward cache_adapter into API service initialization. |
| telefuser/pipelines/lingbot_world_fast/session.py | Add optional world_kv_binding + runtime state for cached-latent fast-forward. |
| telefuser/pipelines/lingbot_world_fast/pipeline.py | Add world-KV fast-forward hook points and decode-only cached chunk path. |
| telefuser/entrypoints/cli/main.py | Add --enable-latent-cache and --cache-mode options and forward into run_server. |
| telefuser/cache_mem/vector_store/qdrant.py | Removed legacy cache_mem vector store code. |
| telefuser/cache_mem/vector_store/interfaces.py | Removed legacy cache_mem vector store code. |
| telefuser/cache_mem/vector_store/faiss.py | Removed legacy cache_mem vector store code. |
| `telefuser/cache_mem/vector_store/__init__.py` | Removed legacy cache_mem vector store exports. |
| telefuser/cache_mem/strategies.py | Removed legacy cache_mem strategy implementation/registry. |
| telefuser/cache_mem/storage/memory.py | Removed legacy cache_mem storage backend. |
| telefuser/cache_mem/storage/local_file.py | Removed legacy cache_mem storage backend. |
| telefuser/cache_mem/storage/interfaces.py | Removed legacy cache_mem storage interfaces. |
| telefuser/cache_mem/storage/fluxon.py | Removed legacy cache_mem storage stub. |
| `telefuser/cache_mem/storage/__init__.py` | Removed legacy cache_mem storage exports. |
| telefuser/cache_mem/state/interfaces.py | Removed legacy cache_mem state interfaces. |
| telefuser/cache_mem/src/models/qwen3_vl_reranker.py | Removed legacy cache_mem model code. |
| telefuser/cache_mem/src/models/qwen3_vl_embedding.py | Removed legacy cache_mem model code. |
| telefuser/cache_mem/metadata.py | Removed legacy cache_mem metadata manager. |
| telefuser/cache_mem/log_monitor.py | Removed legacy cache_mem log sink utilities. |
| telefuser/cache_mem/latent_cache.py | Removed legacy cache_mem LatentCache facade. |
| telefuser/cache_mem/encoding/interfaces.py | Removed legacy cache_mem encoder interfaces. |
| telefuser/cache_mem/encoders.py | Removed legacy cache_mem encoder wiring. |
| telefuser/cache_mem/connection.py | Removed legacy cache_mem connection manager. |
| telefuser/cache_mem/config.py | Removed legacy cache_mem config types. |
| telefuser/cache_mem/cache_types.py | Removed legacy cache_mem cache result/types. |
| `telefuser/cache_mem/__init__.py` | Removed legacy cache_mem package facade. |
| pyproject.toml | Pin torch/torchvision and update cache extra description to reflect CacheSeek usage. |
| examples/wan_video/wan22_14b_text_to_video_service.py | Update service example docs/config for CacheSeek-based lifecycle. |
| examples/wan_video/wan22_14b_text_to_video_service_nocache.py | New “no-cache” Wan2.2 service example variant. |
| docs/zh/latent_cache.md | Rewrite doc to describe CacheSeek integration and updated service flow. |
| docs/en/latent_cache.md | Rewrite doc to describe CacheSeek integration and updated service flow. |
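The table lists CLI plumbing for `--enable-latent-cache` and `--cache-mode` that is forwarded into the server config. A rough sketch of that plumbing is below, using `argparse` purely for illustration; the CLI framework TeleFuser actually uses is not shown on this page, and `build_parser`, `to_server_config`, and the example mode value are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two flags named in this PR."""
    parser = argparse.ArgumentParser(prog="serve-sketch")
    parser.add_argument(
        "--enable-latent-cache",
        action="store_true",  # off unless explicitly requested
        help="Enable the optional CacheSeek-backed latent cache.",
    )
    parser.add_argument(
        "--cache-mode",
        default=None,  # None means: no service-level override
        help="Optional cache mode override (accepted values are not listed here).",
    )
    return parser


def to_server_config(argv: Optional[Sequence[str]] = None) -> dict:
    """Parse argv and return the fields a server config would carry."""
    args = build_parser().parse_args(argv)
    return {
        "enable_latent_cache": args.enable_latent_cache,
        "cache_mode": args.cache_mode,
    }
```

Using `store_true` keeps the "cache disabled by default" behavior: omitting both flags yields a config with the cache off and no mode override.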
Comment on lines 24 to +25
    def _build_cache_task_request(task_data: dict) -> SimpleNamespace:
        """Build a minimal task_request stub for the cache layer.

        Splatting ``task_data`` directly would crash because ``TaskRequest`` is
        ``extra="allow"`` and may contain keys that are not valid Python
        identifiers. The cache layer only reads ``task_id`` / ``task`` /
        ``prompt`` via ``getattr``, so we whitelist those.
        """
        """Build a minimal task_request stub for the cache layer."""
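For context, the whitelisting idea described in the longer docstring could look roughly like this. It is a sketch, not the function body from the PR (which is not visible here): the field tuple comes from the docstring, while defaulting missing fields to `None` is an assumption.

```python
from types import SimpleNamespace

# Fields the docstring says the cache layer reads via getattr.
_CACHE_FIELDS = ("task_id", "task", "prompt")


def build_cache_task_request(task_data: dict) -> SimpleNamespace:
    """Copy only the whitelisted fields into an attribute-style stub."""
    # Assumption: a missing field becomes None rather than raising.
    return SimpleNamespace(**{name: task_data.get(name) for name in _CACHE_FIELDS})
```

Because only known identifier-safe names are copied, extra keys carried by an `extra="allow"` request never reach the stub.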
Co-authored-by: @yx0716
Description
This PR replaces TeleFuser's in-tree latent cache implementation with an optional CacheSeek integration path. It wires CacheSeek into the service container, task service, CLI flags, Wan2.2 service examples, and LingBot World Fast world-KV hooks while keeping latent cache disabled unless explicitly requested.
Motivation
Cross-request latent/KV reuse is now owned by CacheSeek instead of TeleFuser-local `cache_mem` code. TeleFuser should depend on CacheSeek only when the feature is enabled, fail clearly when CacheSeek is missing, and keep the default import/runtime path lightweight when latent cache is disabled.
Type of Change
Changes Made
- … `telefuser/cache_mem` implementation and related unit tests.
- … `world_kv_binding` runtime hooks so CacheSeek exact-prefix reuse can fast-forward cached chunks.
- … https://github.com/Tele-AI/CacheSeek.
Testing
- … (`pytest tests/`)
Test commands:
Additional validation notes:
- … `lookup_hit skip_step=1` and `save_stored`.
- … `all_pass=true`.
- … `world_kv: fast-forward 1 chunks (decode-only)`.
Checklist
- … (`ruff`)
- … (`pre-commit run --all-files`)
- … (`pytest tests/`)
- … `[TYPE] Brief description`
Related Issues
N/A
Additional Notes
This PR is scoped to the TeleFuser-side CacheSeek adaptation. It does not include CacheSeek repository changes. The LingBot e2e command above exercises CacheSeek as an external dependency to verify that the TeleFuser `world_kv_binding` hooks are usable end to end.
GPU Architecture Support
No kernel-specific code was changed. Real e2e validation ran on NVIDIA H100.
Performance Impact
No kernel-level performance change is intended. CacheSeek reuse can reduce repeated work when enabled. The LingBot exact-prefix smoke e2e showed functional reuse:
7.456s, 1.519s, 1.494s, 4.628s
These are smoke e2e timings on H100 and should not be treated as a formal benchmark.
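To illustrate what "exact-prefix reuse" with a decode-only fast path means in principle, here is a small sketch. It is not the LingBot World Fast pipeline code: `count_cached_prefix`, `run_chunks`, the string chunk keys, and the `generate`/`decode` callables are all hypothetical, and real KV/latent state is replaced by plain Python values.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def count_cached_prefix(chunk_keys: Sequence[str], cache: Dict[str, Any]) -> int:
    """Count leading chunks present in the cache, stopping at the first miss."""
    n = 0
    for key in chunk_keys:
        if key not in cache:
            break
        n += 1
    return n


def run_chunks(
    chunk_keys: Sequence[str],
    cache: Dict[str, Any],
    generate: Callable[[str], Any],
    decode: Callable[[Any], Any],
) -> Tuple[List[Any], int]:
    """Decode cached prefix chunks directly; generate and store the rest."""
    n_cached = count_cached_prefix(chunk_keys, cache)
    outputs = []
    for i, key in enumerate(chunk_keys):
        if i < n_cached:
            latent = cache[key]  # fast-forward: skip generation, decode only
        else:
            latent = generate(key)  # full path for everything after the prefix
            cache[key] = latent
        outputs.append(decode(latent))
    return outputs, n_cached
```

Only a contiguous leading run counts as a hit here: a cached chunk that follows a miss is regenerated, since its state would depend on the chunks before it.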